1  Introduction

This document summarizes the research conducted on the AFOSR-supported
project “Search Control for Automatic Plan Generation”, Contract
F49620-96-1-0403, during the period between Aug. 1, 1996 and Jan. 31, 1998
(18 months). The goal of the project has been to analyze the effectiveness of
alternative search control strategies for automatic plan generation and to
investigate the interactions between search control strategies and other
aspects of the planning architecture. Our efforts were focused on developing
strategies for:

•  search  control  in  partial-order  causal-link  planning; 

•  search  control  in  conditional  planning; 

•  monitor-establishment  in  dynamic  planning,  with  an  emphasis  on  the 
interaction  between  monitoring  and  the  efficiency  of  planning. 

In  addition,  early  in  the  project  we  completed  some  relevant  work  that  had 
been  begun  prior  to  the  project  start-date,  involving  search  control  for  planners 
operating  in  domains  in  which  actions  have  explicit  costs  associated  with  them. 

This report is organized around these topics. We briefly describe the work
we did on each of these topics, followed by a list of project-sponsored
publications, which provide more details of the work. Copies of these
publications are included as an appendix to this report.

2  Search  Control  for  Partial-Order  Causal  Link 
Planning 

Much of the current research in plan generation centers on partial-order
causal link (POCL) algorithms, which descend from McAllester and Rosenblitt’s
[9] SNLP algorithm. POCL planning involves searching through a space of
partial plans, where the successors of a node representing partial plan P are
refinements of P. As with any search problem, POCL planning requires
effective search control strategies.

In POCL planning, search control has two components. The first, node
selection, involves choosing which partial plan to refine next. Once a partial
plan has been selected for refinement, the planner must then perform flaw
selection, which involves choosing either a threat to resolve or an open
condition to establish.
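The two control points described above can be made concrete with a minimal sketch of a plan-space search loop. This is an illustration only, not the algorithm of any particular planner: the function names (`refinement_search`, `flaws_of`, `select_flaw`, `refine`, `rank`) are hypothetical, and the use of a best-first priority queue for node selection is an assumption made for the example.

```python
import heapq
from itertools import count


def refinement_search(initial_plan, flaws_of, select_flaw, refine, rank):
    """Generic plan-space search with both control points made explicit.

    All arguments are hypothetical, caller-supplied pieces:
      flaws_of(plan)          -> list of flaws (threats and open conditions)
      select_flaw(plan, fls)  -> the one flaw to work on next
      refine(plan, flaw)      -> iterable of successor plans repairing the flaw
      rank(plan)              -> number; lower means "refine sooner"
    """
    tie = count()  # tie-breaker so the heap never compares plans directly
    frontier = [(rank(initial_plan), next(tie), initial_plan)]
    while frontier:
        # Node selection: choose which partial plan to refine next.
        _, _, plan = heapq.heappop(frontier)
        flaws = flaws_of(plan)
        if not flaws:
            return plan  # no threats or open conditions remain
        # Flaw selection: choose one threat or open condition.
        flaw = select_flaw(plan, flaws)
        # Each way of repairing the chosen flaw yields a successor plan.
        for successor in refine(plan, flaw):
            heapq.heappush(frontier, (rank(successor), next(tie), successor))
    return None  # search space exhausted without finding a flawless plan
```

In this sketch, changing `rank` changes the node selection strategy, while changing `select_flaw` changes the flaw selection strategy; the two can be varied independently.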

Over the past few years, several studies have compared the relative
efficiency of alternative flaw selection strategies for POCL planning and
their extensions [11, 8, 13, 6, 18]. These studies have been motivated at
least in part by a tension between the attractive formal properties of the
POCL algorithms, and the limitations in putting them to practical use that
result from their relatively poor performance. To date, the POCL algorithms
cannot match the efficiency of the so-called industrial-strength planners
such as SIPE [16, 17] and O-Plan [4, 14]. Flaw selection strategy has been
shown to have a significant effect on the efficiency of POCL planning
algorithms, and thus researchers have viewed the design of improved flaw
selection strategies as one means of making POCL planning algorithms more
practical.

In the current project, we completed an extensive experimental study of the
relative performance of the main control strategies that have been proposed
in the prior literature for partial-order causal-link planning. Our results
are presented in [12], in which we review the literature on flaw selection
strategies, and present new experimental results that generalize the earlier
work and explain some of the discrepancies in it. In particular, we describe
the Least-Cost Flaw Repair (LCFR) strategy developed and analyzed by Joslin
and Pollack [8], and compare it with other strategies, including Gerevini and
Schubert’s ZLIFO strategy [6]. LCFR and ZLIFO make very different, and
apparently conflicting, claims about the most effective way to reduce
search-space size in POCL planning. We resolve this conflict, arguing that
much of the benefit that Gerevini and Schubert ascribe to the LIFO component
of their ZLIFO strategy is better attributed to other causes.

More specifically, we showed that neither the LCFR nor ZLIFO flaw
selection strategy consistently generates smaller search spaces, but that by
combining LCFR’s least-cost approach with the delay of separable threats that
is included in the ZLIFO strategy, we obtain a strategy, LCFR-DSep, whose
space performance was nearly always as good as the better of LCFR or ZLIFO
on a given problem. We therefore concluded that much of ZLIFO’s advantage
relative to LCFR is due to its delay of separable threats rather than to its
use of a LIFO strategy. Although we were unable to resolve the question of
whether least-cost selection is required for unforced, as well as forced,
flaws, we found no evidence that a LIFO strategy for unforced flaws was
better. On the other hand, separable-threat delay is clearly advantageous.
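The combination described above (least-cost selection plus delay of separable threats) can be sketched as a single flaw selection function. This is a hedged illustration rather than the implementation evaluated in [12]: the `Flaw` record, its field names, and the choice to measure repair cost as a precomputed integer (for instance, the number of refinements that would repair the flaw) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flaw:
    """Hypothetical flaw record; field names are illustrative only."""
    name: str
    kind: str                # "open" (open condition) or "threat"
    repair_cost: int         # assumed measure, e.g. number of possible repairs
    separable: bool = False  # only meaningful when kind == "threat"


def select_flaw_lcfr_dsep(flaws):
    """Pick the cheapest flaw, delaying separable threats.

    Separable threats are considered only when no other flaw remains;
    among the candidates, the flaw with the least repair cost is chosen.
    """
    if not flaws:
        raise ValueError("no flaws to select from")
    # Delay separable threats: set them aside while anything else is pending.
    preferred = [f for f in flaws if not (f.kind == "threat" and f.separable)]
    candidates = preferred or list(flaws)
    # Least-cost choice; min() keeps the first of several equally cheap flaws.
    return min(candidates, key=lambda f: f.repair_cost)
```

A function of this shape could be passed as the flaw selection step of a POCL search loop. Note that a cheap separable threat is still skipped while any open condition or nonseparable threat is pending, which is the "delay" part of the strategy.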

We also considered the question of computation time, and showed that
LCFR-DSep often requires computation time comparable to that of ZLIFO.
LCFR-DSep can therefore be seen as paying for its own computational
overhead by its search-space reduction.

These conclusions, however, are tempered by the fact that for certain
clusters of problems, our combined strategy, LCFR-DSep, does not generate
minimal search spaces. In sum, as a result of our experiments we now
understand the reasons that LCFR and ZLIFO perform the way they do, and how
to combine the best features of both to create good default strategies for
POCL planning. At the same time, it is clear that certain domain-dependent
characteristics, such as those we identified in several of the domains we
studied, must still be taken into account in settling on a flaw selection
strategy for any particular planning domain.

3  Search  Control  for  Conditional  Planning 

Conditional planning is an important extension to traditional planning.
Conditional planners allow for conditional actions with multiple possible
outcomes and for sensing actions that allow agents to determine the current
state [1, 5, 7, 3]. A key question in conditional planning is: how many, and
which, of the possible execution failures should be planned for? One cannot,
in general, plan for all the failures that can be anticipated: there are
simply too many. But neither can one ignore all the possible failures, or one
will fail to produce sufficiently flexible plans. Essentially, this question
can be viewed as one of search control: which portion of the plan space
should be searched first, to provide the highest-quality contingency plans?

In the current project, we developed Mahinur, a probabilistic partial-order
planner that supports conditional planning with contingency selection; our
work on this is reported in [10]. Mahinur implements an iterative refinement
planning algorithm that identifies the contingencies that contribute the most
to the plan’s overall value, and gives priority to the contingencies whose
failure would have the greatest negative impact. We concentrated on two
aspects of the problem, namely, planning methods for an iterative conditional
planner and a method for computing the negative impact of possible sources of
failure.
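To illustrate the idea of prioritizing contingencies by negative impact, the following minimal sketch ranks candidate contingencies by a simple expected loss. It is not Mahinur's actual computation, which is described in [10]; the scoring rule (failure probability times value lost if the failure is left unhandled), the function name, and the tuple layout are assumptions chosen only to make the prioritization step concrete.

```python
def rank_contingencies(contingencies):
    """Order contingencies from most to least worth planning for.

    contingencies: iterable of (name, failure_probability, value_lost)
    tuples, where value_lost is the assumed drop in plan value if the
    failure occurs and no contingency plan exists for it.
    """
    # Assumed impact model: expected loss = probability * value lost.
    scored = [(p * loss, name) for name, p, loss in contingencies]
    # sorted() is stable, so ties keep their original relative order.
    scored = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]
```

An iterative planner in this style would take the first name from the ranking, extend the plan to handle that failure, and then recompute the ranking, since handling one contingency can change the expected impact of the others.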

We conducted experiments with the first implementation of Mahinur, and
compared its performance to other probabilistic conditional planners, notably
the C-Buridan [5] system, the best-known alternative for partial-order
contingency planning. Mahinur differs from C-Buridan and other earlier
systems in explicitly calculating the expected value of handling alternative
contingencies at plan time. Our experiments showed that these calculations
result in a significant increase in Mahinur’s planning efficiency, relative
to C-Buridan. We are continuing to work on the Mahinur system, incorporating
methods for reducing the probability of failure by adding more supporting
actions, and implementing a much larger real-world domain to use as the basis
of extended experimental analyses.

4  Monitor-Selection  in  Planning 

A further extension to planners is required when agents are situated in
dynamic environments. There, a central challenge for an agent is to be
appropriately sensitive to changes in its environment. In general, it is too
costly to be responsive to every environmental feature that the agent knows
about. On the other hand, an agent that is completely unresponsive may fail
to take advantage of circumstances that can improve its plans and/or shorten
its planning time considerably. The need to balance sensitivity to
environmental change against appropriate stability of the plans being formed
is strongly reminiscent of the ideas that led to the design of the IRMA
architecture and filtering strategy in our earlier work [2].

In recent work on this project, we have introduced the idea of
rationale-based monitoring, reported in [15]. In this approach, planning is
strongly identified as a decision-making process and the planning system
records the rationale for the choices it makes. Even when planning consists
mainly in task decomposition, it will typically involve choosing between
alternatives, and the reasons for those choices constitute the plan
rationale. The agent can then focus its attention on those changes in the
environment that would affect the truth-value of the planning rationale.

A novel aspect of our approach is that we not only monitor features of the
world that affect the current plan, but also features of the world that
played a role in the decision to select that plan over alternative
possibilities. We maintain two sets of monitors: plan-based and
alternative-based. Every time the agent needs to make a decision among
alternatives, it deliberates and selects a particular plan. The selected plan
gives rise to the plan-based monitors. At the same time, the alternatives
considered give rise to alternative-based monitors. As the world state is
dynamically changing, the agent remembers alternatives that it judged less
valuable, monitoring the world state to see if that judgement should be
changed. We implemented a prototype version of rationale-based monitoring and
conducted preliminary experiments showing that it can lead to improved plans
without significant overhead.
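The two sets of monitors described above can be pictured with a small sketch. It is an illustration under stated assumptions, not the prototype reported in [15]: the world state is assumed to be a dictionary of feature values, the class and method names (`MonitorSet`, `triggered`) are hypothetical, and each monitor is reduced to a single feature-value check.

```python
from dataclasses import dataclass, field


@dataclass
class MonitorSet:
    """Hypothetical container for the monitors created by one decision."""
    # feature -> value that the selected plan relies on
    plan_based: dict = field(default_factory=dict)
    # feature -> value that would make a rejected alternative worth revisiting
    alternative_based: dict = field(default_factory=dict)

    def triggered(self, world):
        """Return the monitors that fire for the given world state."""
        fired = []
        # A plan-based monitor fires when a relied-upon condition stops holding.
        for feature, expected in self.plan_based.items():
            if world.get(feature) != expected:
                fired.append(("plan", feature))
        # An alternative-based monitor fires when a condition favoring a
        # previously rejected alternative starts holding.
        for feature, wanted in self.alternative_based.items():
            if world.get(feature) == wanted:
                fired.append(("alternative", feature))
        return fired
```

In this sketch a fired plan-based monitor signals that the current plan may need repair, while a fired alternative-based monitor signals that the earlier choice among alternatives may be worth reconsidering.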

1 The first two papers listed were completed prior to the start date of the
current contract, and thus do not acknowledge this contract. However, they
are both within the scope of the current effort, and follow-on work, which
was reported when the papers were presented at the conference, was done
during the current contract period.